feat: v0.1.0 release polish - critical and major fixes by spignotti · Pull Request #4 · spignotti/litresearch

spignotti · 2026-03-23T15:10:50Z

Summary

This PR implements all critical and major fixes from FEATURE.md to prepare litresearch for v0.1.0 publication.

Critical Fixes

JSON parsing error handling - Wrapped json.loads() in both _screen_paper and _analyze_paper with try/except JSONDecodeError. Each function now returns None and prints a warning on a malformed LLM response instead of crashing.
Semantic Scholar timeout/retry - Added s2_timeout config setting (default 10s). S2 client is now created with timeout=settings.s2_timeout, retry=False in both discovery and enrichment stages to prevent 14-minute hangs.
PDF double-download prevention - PDFs are now saved during the analysis stage to the papers/ directory and marked pdf_downloaded=True. The export stage skips papers that are already downloaded.
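
The JSON guard described above can be sketched as follows. This is a minimal illustration and not the project's code: `parse_llm_json` is a hypothetical helper name, and the check that the payload is a JSON object is an assumption about what the screening and analysis prompts return.

```python
from __future__ import annotations

import json
from typing import Any


def parse_llm_json(raw: str) -> dict[str, Any] | None:
    """Parse an LLM response as a JSON object; return None if it is malformed.

    Hypothetical helper for illustration only.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Warn and let the caller skip this paper instead of crashing the run.
        print(f"warning: could not parse LLM response as JSON: {exc}")
        return None
    # Assumption: a usable response is a JSON object, not a list or scalar.
    if not isinstance(data, dict):
        print("warning: LLM response was valid JSON but not an object")
        return None
    return data
```

A caller would treat a `None` result as "skip this paper" and continue with the next one.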

Major Fixes

Immutable Settings construction - Refactored _build_settings() to use Settings(**overrides) pattern instead of post-init mutation.
Output directory collision handling - Added --overwrite flag and auto-increment logic. When the output directory exists and is populated, the run automatically uses output-2, output-3, etc.
No-abstract paper handling - Papers without abstracts now get a ScreeningResult with relevance_score=0 and rationale="no abstract available" instead of being silently skipped.
LLMError handling in query_gen - Wrapped call_llm() in try/except with clear error message on failure.
Config file hygiene - Renamed litresearch.toml to litresearch.toml.example (already in .gitignore).
HTML entity unescaping - Applied html.unescape() to title, venue, and abstract fields in Paper.from_s2().
Stage-level test coverage - Added 3 new test files covering the query generation, screening, and discovery stages:
- test_stages_query_gen.py - query generation and error handling
- test_stages_screening.py - screening behavior and no-abstract handling
- test_stages_discovery.py - S2 client config and deduplication
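
The output directory auto-increment from the list above could work roughly like this. It is a sketch under assumptions: `resolve_output_dir` is a hypothetical name, and "populated" is taken to mean "an existing directory with at least one entry".

```python
from __future__ import annotations

from pathlib import Path


def resolve_output_dir(base: Path, overwrite: bool = False) -> Path:
    """Pick an output directory that does not clobber an earlier run."""

    def is_populated(path: Path) -> bool:
        # Assumption: an empty or missing directory is safe to reuse.
        return path.is_dir() and any(path.iterdir())

    if overwrite or not is_populated(base):
        return base
    counter = 2
    while True:
        # output -> output-2 -> output-3 ... until a free name is found.
        candidate = base.with_name(f"{base.name}-{counter}")
        if not is_populated(candidate):
            return candidate
        counter += 1
```

With `overwrite=True` the base directory is returned unchanged, which mirrors the intent of the --overwrite flag.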

Minor Polish

Added BATCH_SIZE comment documenting S2 batch endpoint limit
Added run summary block showing timing and counts at pipeline completion
Changed screening_threshold default from 40 to 60 with documentation

Validation

All nox sessions pass:

✅ lint (ruff check + format)
✅ typecheck (pyright, 0 errors)
✅ test (23 tests passed)

Commits

fix: critical issues - JSON parsing, S2 timeout, PDF deduplication
fix(cli): immutable settings construction and output collision handling
fix: handle no-abstract papers and LLMError in query_gen
fix: rename litresearch.toml to example and unescape HTML entities
test: add stage-level tests for query_gen, screening, discovery
chore: minor polish - comments, summary output, threshold default

- Guard json.loads() in analysis.py with try/except JSONDecodeError
- Add s2_timeout config setting (default 10s) with retry=False for S2 client
- Prevent PDF double-download by saving during analysis and marking pdf_downloaded
- Skip already-downloaded PDFs in export stage
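
The export-stage skip mentioned here might look roughly like the following. The `Paper` class is a stand-in for the project's model, `pdf_url` is an assumed field, and `papers_to_download` is a hypothetical helper; only the `pdf_downloaded` flag comes from the PR description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paper:
    """Stand-in for the project's paper model; only the relevant fields."""

    paper_id: str
    pdf_url: str | None = None  # assumed field name
    pdf_downloaded: bool = False


def papers_to_download(papers: list[Paper]) -> list[Paper]:
    """Return papers whose PDF still has to be fetched in the export stage."""
    # Papers already saved during analysis carry pdf_downloaded=True and are skipped.
    return [p for p in papers if p.pdf_url and not p.pdf_downloaded]
```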

- Refactor _build_settings to use immutable Settings(**overrides) pattern
- Add --overwrite flag to run command
- Auto-increment output directory name when directory exists and is populated
- Add tests for collision detection and overwrite behavior
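
The Settings(**overrides) pattern can be illustrated with a frozen dataclass. This is a sketch, not the project's implementation: the real Settings class may be built on a different library, `build_settings` is a hypothetical name, and only the two defaults shown (10 and 60) are taken from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Settings:
    """Stand-in settings class; frozen so instances cannot be mutated."""

    s2_timeout: int = 10
    screening_threshold: int = 60


def build_settings(**cli_options: Any) -> Settings:
    """Construct Settings once from CLI options instead of mutating it later."""
    # Assumption: None means the user did not pass that option.
    overrides = {key: value for key, value in cli_options.items() if value is not None}
    return Settings(**overrides)
```

Because every value is supplied at construction time, validation runs once and no code path can leave the object half-updated.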

spignotti added 6 commits March 23, 2026 16:00

fix: handle no-abstract papers and LLMError in query_gen

993589f

- Write ScreeningResult with score=0 for papers without abstract
- Wrap call_llm in try/except LLMError in query_gen with clear error message
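
A minimal sketch of the no-abstract handling in this commit, assuming simplified stand-ins for the project's `Paper` and `ScreeningResult` models. The function name `screen_without_abstract` is hypothetical; the score of 0 and the rationale string come from the PR description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paper:
    """Stand-in paper model with only the fields used here."""

    paper_id: str
    title: str
    abstract: str | None = None


@dataclass
class ScreeningResult:
    """Stand-in screening record."""

    paper_id: str
    relevance_score: int
    rationale: str


def screen_without_abstract(paper: Paper) -> ScreeningResult | None:
    """Return an explicit zero-score result when there is no abstract.

    Returns None when an abstract exists, meaning normal LLM screening applies.
    """
    if paper.abstract and paper.abstract.strip():
        return None
    # Record the paper explicitly so it is not silently dropped.
    return ScreeningResult(paper.paper_id, 0, "no abstract available")
```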

fix: rename litresearch.toml to example and unescape HTML entities

34e415d

- Rename litresearch.toml to litresearch.toml.example (git mv)
- Add html.unescape() for title, abstract, venue in Paper.from_s2()
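
The effect of html.unescape() on these fields can be shown with a small example. The dictionary-based `unescape_fields` helper is hypothetical; the actual Paper.from_s2() presumably works on a Semantic Scholar result object rather than a plain dict.

```python
from __future__ import annotations

import html


def unescape_fields(record: dict[str, str | None]) -> dict[str, str | None]:
    """Return a copy of record with HTML entities decoded in text fields."""
    cleaned = dict(record)
    for key in ("title", "venue", "abstract"):
        value = cleaned.get(key)
        # Missing or None fields are left untouched.
        if value is not None:
            cleaned[key] = html.unescape(value)
    return cleaned
```

For instance, a title stored as `Graphs &amp; Networks` comes back as `Graphs & Networks`.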

test: add stage-level tests for query_gen, screening, discovery

ac9f899

- Test query generation with successful LLM response and error handling
- Test screening behavior for no-abstract papers and JSON parse failures
- Test discovery S2 client configuration and paper deduplication

chore: minor polish - comments, summary output, threshold default

42f744f

- Add comment for BATCH_SIZE in enrichment.py
- Add run summary block in pipeline.py with timing and counts
- Change screening_threshold default from 40 to 60 with documentation

spignotti merged commit 7fd0ebd into main Mar 23, 2026
2 checks passed

spignotti deleted the feat/v0.1.0-polish branch March 23, 2026 15:12

spignotti mentioned this pull request Mar 23, 2026

fix(s2): enforce 1 rps throttling across S2 stages #5

Closed

spignotti merged 6 commits into main from feat/v0.1.0-polish